Westmorland County
- North America > Canada > Quebec (0.04)
- North America > United States > Wisconsin (0.04)
- North America > Canada > New Brunswick > Westmorland County > Moncton (0.04)
- Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)
SmolRGPT: Efficient Spatial Reasoning for Warehouse Environments with 600M Parameters
Traore, Abdarahmane, Hervet, Éric, Couturier, Andy
Recent advances in vision-language models (VLMs) have enabled powerful multimodal reasoning, but state-of-the-art approaches typically rely on extremely large models with prohibitive computational and memory requirements. This makes their deployment challenging in resource-constrained environments such as warehouses, robotics, and industrial applications, where both efficiency and robust spatial understanding are critical. In this work, we present SmolRGPT, a compact vision-language architecture that explicitly incorporates region-level spatial reasoning by integrating both RGB and depth cues. Smol-RGPT employs a three-stage curriculum that progressively align visual and language features, enables spatial relationship understanding, and adapts to task-specific datasets. W e demonstrate that with only 600M parameters, SmolRGPT achieves competitive results on challenging warehouse spatial reasoning benchmarks, matching or exceeding the performance of much larger alternatives.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
TrajFusionNet: Pedestrian Crossing Intention Prediction via Fusion of Sequential and Visual Trajectory Representations
Landry, François G., Akhloufi, Moulay A.
--With the introduction of vehicles with autonomous capabilities on public roads, predicting pedestrian crossing intention has emerged as an active area of research. The task of predicting pedestrian crossing intention involves determining whether pedestrians in the scene are likely to cross the road or not. In this work, we propose TrajFusionNet, a novel transformer-based model that combines future pedestrian trajectory and vehicle speed predictions as priors for predicting crossing intention. TrajFusionNet comprises two branches: a Sequence Attention Module (SAM) and a Visual Attention Module (V AM). The SAM branch learns from a sequential representation of the observed and predicted pedestrian trajectory and vehicle speed. Complementarily, the V AM branch enables learning from a visual representation of the predicted pedestrian trajectory by overlaying predicted pedestrian bounding boxes onto scene images. By utilizing a small number of lightweight modalities, TrajFusionNet achieves the lowest total inference time (including model runtime and data preprocessing) among current state-of-the-art approaches. In terms of performance, it achieves state-of-the-art results across the three most commonly used datasets for pedestrian crossing intention prediction.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- North America > Canada > New Brunswick > Westmorland County > Moncton (0.04)
- Africa > Mauritania > Hodh El Gharbi > Aioun (0.04)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
- North America > Canada > Quebec (0.04)
- North America > United States > Wisconsin (0.04)
- North America > Canada > New Brunswick > Westmorland County > Moncton (0.04)
- (2 more...)
Heart2Mind: Human-Centered Contestable Psychiatric Disorder Diagnosis System using Wearable ECG Monitors
Nguyen, Hung, Rahimi, Alireza, Whitford, Veronica, Fournier, Hélène, Kondratova, Irina, Richard, René, Cao, Hung
Psychiatric disorders affect millions globally, yet their diagnosis faces significant challenges in clinical practice due to subjective assessments and accessibility concerns, leading to potential delays in treatment. To help address this issue, we present Heart2Mind, a human-centered contestable psychiatric disorder diagnosis system using wearable electrocardiogram (ECG) monitors. Our approach leverages cardiac biomarkers, particularly heart rate variability (HRV) and R-R intervals (RRI) time series, as objective indicators of autonomic dysfunction in psychiatric conditions. The system comprises three key components: (1) a Cardiac Monitoring Interface (CMI) for real-time data acquisition from Polar H9/H10 devices; (2) a Multi-Scale Temporal-Frequency Transformer (MSTFT) that processes RRI time series through integrated time-frequency domain analysis; (3) a Contestable Diagnosis Interface (CDI) combining Self-Adversarial Explanations (SAEs) with contestable Large Language Models (LLMs). Our MSTFT achieves 91.7% accuracy on the HRV-ACC dataset using leave-one-out cross-validation, outperforming state-of-the-art methods. SAEs successfully detect inconsistencies in model predictions by comparing attention-based and gradient-based explanations, while LLMs enable clinicians to validate correct predictions and contest erroneous ones. This work demonstrates the feasibility of combining wearable technology with Explainable Artificial Intelligence (XAI) and contestable LLMs to create a transparent, contestable system for psychiatric diagnosis that maintains clinical oversight while leveraging advanced AI capabilities. Our implementation is publicly available at: https://github.com/Analytics-Everywhere-Lab/heart2mind.
- North America > Canada > New Brunswick > York County > Fredericton (0.14)
- North America > Canada > New Brunswick > Westmorland County > Moncton (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- (5 more...)
- Research Report > Promising Solution (0.48)
- Research Report > New Finding (0.45)
- Research Report > Experimental Study (0.45)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.67)
- Government > Regional Government > North America Government > United States Government > FDA (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Violence detection in videos using deep recurrent and convolutional neural networks
Traoré, Abdarahmane, Akhloufi, Moulay A.
Violence and abnormal behavior detection research have known an increase of interest in recent years, due mainly to a rise in crimes in large cities worldwide. In this work, we propose a deep learning architecture for violence detection which combines both recurrent neural networks (RNNs) and 2-dimensional convolutional neural networks (2D CNN). In addition to video frames, we use optical flow computed using the captured sequences. CNN extracts spatial characteristics in each frame, while RNN extracts temporal characteristics. The use of optical flow allows to encode the movements in the scenes. The proposed approaches reach the same level as the state-of-the-art techniques and sometime surpass them. It was validated on 3 databases achieving good results.
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > New Brunswick > Westmorland County > Moncton (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Asia > Taiwan > Taiwan > Taipei (0.04)
Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction
Saito, Kuniaki, Sohn, Kihyuk, Lee, Chen-Yu, Ushiku, Yoshitaka
Large language models require updates to remain up-to-date or adapt to new domains by fine-tuning them with new documents. One key is memorizing the latest information in a way that the memorized information is extractable with a query prompt. However, LLMs suffer from a phenomenon called "perplexity curse"; despite minimizing document perplexity during fine-tuning, LLMs struggle to extract information through a prompt sentence. In this new knowledge acquisition and extraction, we find a very intriguing fact that LLMs can accurately answer questions about the first sentence, but they struggle to extract information described in the middle or end of the documents used for fine-tuning. Our study suggests that the auto-regressive training causes this issue; each token is prompted by reliance on all previous tokens, which hinders the model from recalling information from training documents by question prompts. To conduct the in-depth study, we publish both synthetic and real datasets, enabling the evaluation of the QA performance w.r.t. the position of the corresponding answer in a document. Our investigation shows that even a large model suffers from the "perplexity curse", but regularization such as denoising auto-regressive loss can enhance the information extraction from diverse positions. These findings will be (i) a key to improving knowledge extraction from LLMs and (ii) new elements to discuss the trade-off between RAG and fine-tuning in adapting LLMs to a new domain.
- North America > United States > Montana (0.04)
- North America > Canada > New Brunswick > Westmorland County > Moncton (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)
- Media > Film (0.93)
- Leisure & Entertainment (0.93)
Visualization of Extremely Sparse Contingency Table by Taxicab Correspondence Analysis: A Case Study of Textual Data
We present an overview of taxicab correspondence analysis, a robust variant of correspondence analysis, for visualization of extremely sparse ontingency tables. In particular we visualize an extremely sparse textual data set of size 590 by 8265 concerning fragments of 8 sacred books recently introduced by Sah and Fokou\'e (2019) and studied quite in detail by (12 + 1) dimension reduction methods (t-SNE, UMAP, PHATE,...) by Ma, Sun and Zou (2022).
- North America > Canada > New Brunswick > Westmorland County > Moncton (0.04)
- Europe > France (0.04)
- Asia > India (0.04)
- Asia > China > Tibet Autonomous Region (0.04)
Computing linear sections of varieties: quantum entanglement, tensor decompositions and beyond
Johnston, Nathaniel, Lovitz, Benjamin, Vijayaraghavan, Aravindan
We study the problem of finding elements in the intersection of an arbitrary conic variety in $\mathbb{F}^n$ with a given linear subspace (where $\mathbb{F}$ can be the real or complex field). This problem captures a rich family of algorithmic problems under different choices of the variety. The special case of the variety consisting of rank-1 matrices already has strong connections to central problems in different areas like quantum information theory and tensor decompositions. This problem is known to be NP-hard in the worst case, even for the variety of rank-1 matrices. Surprisingly, despite these hardness results we develop an algorithm that solves this problem efficiently for "typical" subspaces. Here, the subspace $U \subseteq \mathbb{F}^n$ is chosen generically of a certain dimension, potentially with some generic elements of the variety contained in it. Our main result is a guarantee that our algorithm recovers all the elements of $U$ that lie in the variety, under some mild non-degeneracy assumptions on the variety. As corollaries, we obtain the following new results: $\bullet$ Polynomial time algorithms for several entangled subspaces problems in quantum entanglement, including determining r-entanglement, complete entanglement, and genuine entanglement of a subspace. While all of these problems are NP-hard in the worst case, our algorithm solves them in polynomial time for generic subspaces of dimension up to a constant multiple of the maximum possible. $\bullet$ Uniqueness results and polynomial time algorithmic guarantees for generic instances of a broad class of low-rank decomposition problems that go beyond tensor decompositions. Here, we recover a decomposition of the form $\sum_{i=1}^R v_i \otimes w_i$, where the $v_i$ are elements of the variety $X$. This implies new uniqueness results and genericity guarantees even in the special case of tensor decompositions.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (4 more...)
MalBERT: Using Transformers for Cybersecurity and Malicious Software Detection
Rahali, Abir, Akhloufi, Moulay A.
In recent years we have witnessed an increase in cyber threats and malicious software attacks on different platforms with important consequences to persons and businesses. It has become critical to find automated machine learning techniques to proactively defend against malware. Transformers, a category of attention-based deep learning techniques, have recently shown impressive results in solving different tasks mainly related to the field of Natural Language Processing (NLP). In this paper, we propose the use of a Transformers' architecture to automatically detect malicious software. We propose a model based on BERT (Bidirectional Encoder Representations from Transformers) which performs a static analysis on the source code of Android applications using preprocessed features to characterize existing malware and classify it into different representative malware categories. The obtained results are promising and show the high performance obtained by Transformer-based models for malicious software detection.
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > New Brunswick > Westmorland County > Moncton (0.04)
- Africa > Sudan (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.50)